SUMMARY

Segmentation is a fundamental problem that dominates the success of microscopic image analysis. In almost 25 years of cell detection 
software development, there is still no single piece of commercial software that works well in practice when applied to early mouse 
embryo or stem cell image data. To address this need, we developed MINS (modular interactive nuclear segmentation) as a MATLAB/C++-based
segmentation tool tailored for counting cells and measuring fluorescence intensity in 2D and 3D image data. Our aim
was to develop a tool that is accurate and efficient yet straightforward and user friendly. The MINS pipeline comprises three major 
cascaded modules: detection, segmentation, and cell position classification. An extensive evaluation of MINS on both 2D and 3D images, 
and comparison to related tools, reveals improvements in segmentation accuracy and usability. Thus, its accuracy and ease of use will 
allow MINS to be implemented for routine single-cell-level image analyses. 



INTRODUCTION 

Imaging of optically sectioned nuclei provides an unprec- 
edented opportunity to observe the details of fate specifi- 
cation, tissue patterning, and morphogenetic events at 
single-cell resolution in space and time. Imaging is 
now recognized as the requisite tool for acquiring infor- 
mation to investigate how individual cells behave, as 
well as the determination of mRNA or protein localiza- 
tion or levels within individual cells. To this end, fluo- 
rescent labeling techniques, using genetically encoded 
fluorescent reporters or dye-coupled immunodetection, 
can reveal the sites and levels of expression of certain 
genes or proteins during biological processes. The avail- 
ability of nuclear-localized fluorescent reporters, such as 
human histone H2B-green fluorescent protein (GFP) 
fusion proteins, enables 3D time-lapse (i.e., 4D) live imaging
at single-cell resolution (Hadjantonakis and Papaioannou,
2004; Kanda et al., 1998; Nowotschin et al., 2009)
(Figures 1A-1C). However, to begin to probe intrinsic
characteristics and cellular behaviors represented within 
image data requires the extraction of quantitatively mean- 
ingful information. To do this, one should perform a 
detailed image data analysis, identifying each cell by 
virtue of a single universally present descriptor (usually 
the nucleus), obtaining quantitative measurements of 
fluorescence for each nuclear volume, and eventually 
being able to identify the position and division of cells
and connect them over time for cell tracking and lineage
tracing.

Automated nuclear segmentation of cells grown in cul- 
ture and in early embryos is a necessary first step for a 
variety of image analysis applications in mammalian sys- 
tems. First, automated segmentation can facilitate efficient 
and accurate identification (ID) of individual cells, espe- 
cially in a context of an emergent complex tissue organiza- 
tion; for example, during tissue morphogenesis. This issue 
is exemplified by studies on early, or preimplantation, 
stages of mammalian embryo development, which result 
in the formation of a blastocyst. Mouse blastocyst develop- 
ment offers a relatively simple but relevant model for inves- 
tigating the coordination of cell lineage commitment and 
morphogenesis (Schrode et al., 2013). The blastocyst is 
also a unique stage of development when stem cells repre- 
senting each of the constituent lineages can be derived, 
propagated, differentiated, and interconverted ex vivo. 
Embryonic stem (ES) cells are well known as representative 
of the pluripotent epiblast (EPI) and are characterized by 
their ability to generate all somatic and germline lineages 
in vivo and, most likely, in vitro. Likewise, trophectoderm 
(TE) stem cells represent the trophoblast, and extraembry- 
onic endoderm stem (XEN) cells represent the primitive 
endoderm (PrE) (Artus and Hadjantonakis, 2012). Given 
the ease of in vitro culture of preimplantation embryos, 
their small size (<120 µm), and limited cell number (up to
140 cells), they provide an attractive model for live imaging
the coupling of cell lineage commitment and morphogenesis
and can serve as a proof of principle for studies on
larger, more developmentally advanced and complex 
mammalian embryos. 

With the increasing level of complexity and detail of analyses
performed on mammalian preimplantation embryos,
it is becoming routine to stage embryos based on total cell 
numbers rather than solely by embryonic day (E). For 
example, the blastocyst is a descriptor of a stage having a 
distinctive morphology, with an outer TE epithelial layer 
that encapsulates an inner cell mass (ICM) and a fluid-filled 
cavity (Figure 1D). In the mouse, the blastocyst stage covers
an approximately 36 hr period, from E3.0 at the initiation 
of cavitation until the time of embryo implantation into 
the maternal uterus, which occurs at around E4.5 (Rossant 
and Tam, 2009; Schrode et al., 2013). During this time, 
mouse embryos more than triple their cell number, as 
they go from around 32 cells to over 140 cells. The blasto- 
cyst stage designation is, therefore, quite broad. Indeed, it is 
now known that critical molecular changes take place 
between early blastocyst (32-cell) and late blastocyst 
(>80-cell) stages (Figure 1D) (Schrode et al., 2013). One
of the arguments made against determining total cell 
numbers in individual embryos has been the relative inef- 
ficiency of this measurement, in terms of effective auto- 
mated segmentation and/or the large amount of effort 
required for manual and semiautomated manually cor- 
rected segmentation using generic image analysis software. 
Thus, a simple universal tool able to perform this task 
would be highly desirable, not only for studies on preim- 
plantation mouse embryos but also for analyzing early 
embryos from more complex later stages or tissue samples, 
as well as other mammalian systems, including the human 
(Kuijk et al., 2012; Niakan and Eggan, 2013; Roode et al., 
2012). Since much information on preimplantation-stage 
mammalian embryos is gathered using optical sectioning, 
most frequently by confocal imaging, it is inherently 3D 
and is, therefore, amenable to nuclear segmentation for 
cell number calculations. 

Additionally, robust segmentation is requisite for proper
quantitative analysis of individual cells within populations.
Immunostaining using antibodies directed against
factors present in early embryos or fluorescent mRNA
in situ hybridization experiments not only reveal the site of
expression of any given protein or gene but also, when combined
with quantitative analysis, allow the determination of
levels of expression within individual cells (G. Chia Le
Bin, S.M.-D., S. Leitch, X.L., W. Mansfield, N. Grabole,
H. Niwa, A.K.H., and J. Nichols, unpublished data; Muñoz
Descalzo et al., 2012). This is of particular importance
in various contexts. For example, in preimplantation
mammalian embryos, it is known that the levels of expression
of certain transcription factors can dictate the lineage
choice in any given cell (Guo et al., 2010; Ohnishi et al.,
2014); however, biochemical analyses (for example,
western blots) are limited due to the small size of the embryo
and the small amount of material routinely available.

Finally, nuclear segmentation is the first step toward the 
tracking of individual cells in situ in populations and facil- 
itates the quantitative analysis of cell cohorts over time as 
development progresses (Kang et al., 2013b; Nowotschin 
et al., 2009). Automated nuclear segmentation, subsequent 
tracking of nuclei, and the detection of cell division or cell 
death and subsequent fluorescence intensity quantitation 
are requisite for understanding the dynamic and heteroge- 
neous populations that emerge within stem cell cultures 
and in situ in embryos (Artus et al., 2013; Kang et al., 
2013a). 

Nuclear segmentation, therefore, comprises the first 
stage of any analysis involving the ID of individual cells 
on static or time-lapse 2D or 3D data generated after immu- 
nostaining or time-lapse movies of transgenic reporters 
(Kang et al., 2013b). In segmented images, the set of pixels 
belonging to each individual nucleus within a cohort of 
cells in culture or within an embryo or a tissue can be identified
(Roeder et al., 2012). Most software available with
commercial microscope systems (for example, Zeiss ZEN,
Leica LAS, or Perkin-Elmer Volocity) usually includes standard
plug-ins that allow the user to perform basic quantitative
analyses. In addition, commercial software specifically
designed for image analysis (for example, Bitplane's Imaris
or VSG's Amira) usually provides a user-friendly platform
suitable for more comprehensive data analysis. Although
these latter programs are designed for 2D/3D/4D analysis, 
their application to complex biological samples with 
high or irregular cell densities, such as mouse ES cell 
colonies or mouse embryos, usually requires various 
parameters to be optimized, and such analyses are often 
not straightforward to perform. As a consequence, generic 
programs cannot be considered for simple automated use 
and cannot be applied for routine medium- to high- 
throughput analyses of multiple samples. For this reason, 
manual segmentation is still favored in many situations, 
but this can be highly error prone and often proves too 
laborious and time consuming to be practical. 

Since nuclear segmentation is not straightforward due to 
cell deformations, irregularity in the shape and size of 
nuclei, debris from sample preparations or culture condi- 
tions, imaging artifacts, and, most noticeably, noise and 
blurring, several groups have resorted to developing their 
own methods. Many segmentation methods have been 
applied in the context of embryogenesis and cell culture 
studies. They can be categorized by their underlying image 
processing technique. Deformable models (Yu et al., 2009; 
Zanella et al., 2010) are usually computationally expensive 
and not suitable for 3D data. Blob or local maximum 
detection (Bao et al., 2006; Keller et al., 2008, 2010) is
computationally efficient but subject to shrinking bias, 
which technically serves the purpose of detection rather 
than segmentation. Segmentation by gradient flow 
tracking is very sensitive to object texture (Li et al., 2007). 
The Watershed method is also fast yet produces loose 
boundaries that cover the background (Fernandez et al., 
2010; Olivier et al., 2010). Discrete Markov random field 
optimization allows for incorporation of prior information 
such as shape (X.L. et al., 2012, IEEE, conference), but the 
underlying pipeline is overly complicated and, thus, 
impractical. It is important to note the growing trend of 
developing generic, trainable software frameworks that 
are based on machine learning methods that can interact 
with biologists and solve a variety of problems (Carpenter 
et al., 2006; A. Sommer et al., 2011, IEEE, symposium). 
However, the tradeoff for such generality is specificity. 
These tools usually do not capture the very essential char- 
acteristics of nuclear imaging and so do not provide a pre- 
cise analysis and quantitation. 

To meet this current need, our goal was to develop a tool 
that would make automated cell segmentation feasible and 
efficient in analyzing data from higher organisms, as has 
been applied to less complex data from lower organisms, 
such as in bacteria (Locke and Elowitz, 2009). The objective 
was to assemble a segmentation framework that is accurate 
enough to allow high-fidelity analysis over a variety of 
images while being robust enough to make it practical for 
routine use across laboratories. A major goal was that the 
software had to be a simple and intuitive application that 
could run on a desktop computer having routine process- 
ing power. Usability is a particularly important feature 
that we sought to incorporate in modular interactive 
nuclear segmentation (MINS), as this has been raised as 
an important issue for bioimaging software development 
(Carpenter et al., 2012). We wanted to allow biological re- 
searchers to analyze large 3D imaging data with only a 
few mouse clicks and minimum parameter tuning. 

Here, we report an efficient and user-friendly nuclear seg- 
mentation and quantitation framework (i.e., MINS) for the 
analysis of cohorts of cells in both stem cell cultures and 
in preimplantation stage mouse embryos. Our method 
consists of three core cascaded modules: detection, segmentation,
and classification. Detection provides accurate
localization of cell nuclei only. Segmentation expands the
detection output to cover the full nuclear body. Finally, 
classification serves multiple purposes, including the sepa- 
ration of multiple embryos and removal of outliers, as well 
as the classification of inner and outer cells (ICM versus TE 
cells) within blastocyst-stage embryos. MINS is hosted at 
http://katlab-tools.org. 

RESULTS 

Core Algorithmic Components of MINS 

We chose specimens of increasing complexity for analysis 
using MINS software (Figure 1C). Mouse XEN stem cells
representing the PrE lineage of the blastocyst grow as 
adherent monolayer cultures (Artus et al., 2012; Kunath 
et al., 2005; Niakan et al., 2013). By contrast, mouse ES cells 
grow as adherent multilayered dome-shaped cultures 
(Nichols and Smith, 2011). Like ES cells, preimplantation 
mouse embryos comprise spatially complex cohorts of cells 
(Rossant and Tam, 2009). A direct segmentation of every
nucleus is computationally difficult and also error prone; 
for example, the active contour method is prone to under- 
segmentation (Roeder et al., 2012). We therefore opted to 
break down the problem into two steps: a detection module 
that identifies each nucleus followed by a segmentation 
module that propagates this ID information to the entire 
body of the respective nucleus. We also added a classification
module for TE versus ICM cell lineage ID, which was
based on outer versus inner cell position, respectively,
within preimplantation mouse embryos (Figure 1D).

In brief, the core of MINS consists of three algorithmic 
components: detection, segmentation, and classification. 
Each component was devised and tailored according to 
the specific characteristics of cell culture and mouse 
embryo imaging experiments. The underlying algorithms 
are chosen so that the overall pipeline satisfies our goals: 
high performance and high usability. 
Step 1: Detection 

For the detection of nuclei, we applied the multiscale blob 
detection technique developed previously (X.L. et al., 
2012, IEEE, conference). It uses the robust eigenvalues
of the image Hessian matrix to identify nuclei and also uses
scale-space analysis to suppress noise because noise does



Figure 1. Image Analysis of Cells and Mouse Embryos and a Schematic of Preimplantation Embryo Development 

(A) Schematic showing the experimental setup used for static and live imaging of stem cell and mouse embryo specimens. Notably, samples 
are maintained in liquid culture, and images are acquired on inverted microscope systems. 

(B) Examples of imaging acquisition of 3D static immunostaining (left) or 3D live imaging of fluorescent reporter (right). 

(C) Schematic diagram showing 2D, 3D, and 4D image data acquisition and analysis. 

(D) Differential interference contrast (DIC) images of CAG:H2B-GFP transgenic fluorescent reporter expressing embryos at two-cell,
compact morula, early, and late blastocyst stages merged with 2D and 3D renderings of GFP channel showing nuclear labels and a schematic
diagram of lineage specification during preimplantation development (Schrode et al., 2013). Scale bar, 20 µm.
Figure 2. Procedure for Nuclei Detection and ID

(A) Users provide an input image from, for example, a mouse embryo, as shown here. 

(B1 and B2) The first (B1) and second (B2) eigenvalues of the Hessian matrix are computed from the smoothed image at different
scales.

(C) A binary segmentation is obtained by thresholding the respective eigenvalues in (B1) and (B2).

(D) The final detection is obtained by combining the binary segmentations in (C), and each nucleus is assigned with a unique number using 
connected component analysis. 



not yield a stable response across different scales, in contrast
to real nuclei, which do. A connected component analysis 
assigns a unique ID to each nucleus for retrospective ID. 
Given a 3D image that contains bloblike objects (e.g.,
nuclei; Figure 2A), we start by smoothing the image with
a Gaussian kernel and compute the eigenvalues of the Hessian
matrix at each pixel, which, if they are all negative, indicate
that the pixel belongs to a local region of strong intensity,
i.e., the central region of a nucleus (Figures 2B1 and 2B2).
We then threshold these eigenvalues to obtain a binary mask of
foreground nuclei (Figure 2C). This process is repeated using
different widths of the smoothing Gaussian kernel
(Figure 2Ci [small], Figure 2Cii [medium], and Figure 2Ciii
[large]). By combining the results from all scales (a logical AND
operation), we leverage the advantages provided by each
size of kernel in terms of robustness against noise and
detection accuracy.
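
The detection steps described above can be illustrated with a minimal 2D sketch in plain Python. It is not the MINS implementation (which is MATLAB/C++ and works in 3D); the function names, the finite-difference Hessian, the synthetic test image, and the values chosen for `sigmas` and `thresh` are all assumptions made for illustration. A pixel is kept at one scale when its largest Hessian eigenvalue lies below `-thresh`, the per-scale masks are combined with a logical AND, and connected components receive unique IDs.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(round(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(img, sigma):
    """Separable Gaussian smoothing with edge-replicating borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    def clamp(v, n):
        return min(max(v, 0), n - 1)
    rows = [[sum(k[j + r] * img[y][clamp(x + j, w)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[clamp(y + j, h)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def blob_mask(img, sigma, thresh):
    """True where both Hessian eigenvalues of the smoothed image are < -thresh."""
    s = smooth(img, sigma)
    h, w = len(s), len(s[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ixx = s[y][x + 1] - 2 * s[y][x] + s[y][x - 1]
            iyy = s[y + 1][x] - 2 * s[y][x] + s[y - 1][x]
            ixy = (s[y + 1][x + 1] - s[y + 1][x - 1]
                   - s[y - 1][x + 1] + s[y - 1][x - 1]) / 4.0
            # Largest eigenvalue of the symmetric 2x2 Hessian (closed form).
            largest = (ixx + iyy) / 2.0 + math.hypot((ixx - iyy) / 2.0, ixy)
            mask[y][x] = largest < -thresh
    return mask

def detect_nuclei(img, sigmas, thresh):
    """AND the per-scale masks, then label 4-connected components."""
    masks = [blob_mask(img, s, thresh) for s in sigmas]
    h, w = len(img), len(img[0])
    combined = [[all(m[y][x] for m in masks) for x in range(w)] for y in range(h)]
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if combined[y][x] and labels[y][x] == 0:
                count += 1
                labels[y][x] = count
                stack = [(y, x)]
                while stack:
                    cy, cx = stack.pop()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and combined[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            stack.append((ny, nx))
    return labels, count

def synthetic_image(size, centers, spread=2.5, peak=100.0):
    """Hypothetical test image: one Gaussian bump per (x, y) center."""
    return [[sum(peak * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * spread ** 2))
                 for cx, cy in centers) for x in range(size)] for y in range(size)]

image = synthetic_image(30, [(8, 8), (21, 21)])
labels, count = detect_nuclei(image, sigmas=[1.0, 1.5, 2.0], thresh=0.5)
```

Because only pixels that respond at every scale survive the AND, the resulting components cover the central region of each bump rather than its full extent, which is why a separate segmentation step is still needed.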
Step 2: Segmentation 

After ID of each nucleus, the next task is to propagate this 
ID to the entire body of a respective nucleus. We present 
three examples of this module in Figure 3 (see additional 
details in Supplemental Information available online). To 
do this, we chose seeded geodesic image segmentation 
(SGIS) as the base algorithm because of its runtime efficiency
(A. Criminisi et al., 2008, European Conference on
Computer Vision). Geodesic image segmentation applies
a geodesic distance transform over a grid graph that represents
the image. In particular, the geodesic distance between
two nodes is the length of the shortest path over a grid graph
where the edges are weighted according to the continuity 
of neighboring pixels (normally based on the intensity 
gradient). Therefore, geodesic image segmentation is also 
referred to as the shortest path segmentation. Notably, 
the geodesic distance accounts for the "landscape" of the 
image; that is, the change in intensity between neigh- 
boring pixels along the path. Intuitively, two pixels will 
be considered "far" from one another if an edge (repre- 
sented by a large intensity change) exists between them. 
Assuming a homogeneous intensity distribution within a 
nucleus, this technique allows one to efficiently expand 
the detection (a.k.a. seed) to the entire body of the nucleus, 
but not beyond, because of the existence of the boundary 
of the nucleus (edge). However, hundreds of identified 
nuclei could be present in the same image; thus, SGIS 
would run on each of them sequentially. To speed up this 
procedure, we developed a parallel SGIS based on graph col- 
oring (PSGIS-GC). This propagates multiple IDs simulta- 
neously in a single SGIS run. However, if nuclei in the 
same SGIS run are proximate, then they will be merged 
into one nucleus, causing undersegmentation. This issue 
Figure 3. Flow Diagram of Proposed Algorithm for Nuclear Segmentation

(A) Users provide an input image from either cell culture imaging, in columns (A1) and (A2), or live embryo imaging, in column (A3).

(B) Detection is performed to locate each nucleus.

(C) Graph coloring is used to separate proximate nuclei by assigning different colors to them.

(D and E) Iteratively, a color is selected (D), and geodesic segmentation is called to segment the entire body of the nuclei (E).
(F) The final segmentation is obtained by combining the segmentations from (E). Scale bar, 20 µm.



was addressed using graph coloring, which assures that (1) 
proximate nuclei are always assigned to different SGIS runs, 
and (2) the total number of SGIS runs is minimized. This 
strategy significantly increases speed: we only need at 
most eight SGIS runs (parallelized in multicore systems) 
rather than hundreds of SGIS runs as in the naive (or
sequential) approach, in which one has to run SGIS per
seed against the other seeds.
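
The idea of grouping seeds into a small number of runs can be sketched as follows. This is a hedged illustration, not the PSGIS-GC code: the conflict rule (seeds closer than a hypothetical `min_dist` must not share a run), the function names, and the seed coordinates are assumptions, and the greedy coloring shown here keeps the number of colors small but does not guarantee the minimum.

```python
import math

def build_conflict_graph(seeds, min_dist):
    """Connect every pair of seeds that lie closer than min_dist."""
    adjacency = {i: set() for i in range(len(seeds))}
    for i in range(len(seeds)):
        for j in range(i + 1, len(seeds)):
            if math.dist(seeds[i], seeds[j]) < min_dist:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

def greedy_coloring(adjacency):
    """Give each node the smallest color unused by its neighbors.

    Nodes are visited in order of decreasing degree, a common heuristic.
    """
    colors = {}
    for node in sorted(adjacency, key=lambda v: -len(adjacency[v])):
        used = {colors[n] for n in adjacency[node] if n in colors}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors

def group_into_runs(seeds, min_dist):
    """Return one list of seed indices per segmentation run (one per color)."""
    adjacency = build_conflict_graph(seeds, min_dist)
    colors = greedy_coloring(adjacency)
    runs = [[] for _ in range(max(colors.values()) + 1)] if colors else []
    for node in sorted(colors):
        runs[colors[node]].append(node)
    return runs, adjacency, colors

# Hypothetical seed positions: three touching nuclei in a row and one far away.
seeds = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
runs, adjacency, colors = group_into_runs(seeds, min_dist=1.5)
```

With these seeds, two runs suffice: seeds 1 and 3 share one run and seeds 0 and 2 share the other, so no two proximate seeds are ever propagated together.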
Step 3: Classification 

Once nuclei have been efficiently segmented, additional 
challenges during the analysis need to be tackled. First, 
multiple specimens (i.e., embryos) may be in the same 
Figure 4. Multistep Classification: Multiple Embryo Extraction and Outlier Removal

(A) The embryo separation algorithm successfully detects two embryos in (i) and five embryos in (ii). False detections from the background
are mistaken for true embryonic cell nuclei (yellow arrows).

(B) Outlier removal discards most of the false detections (yellow arrows). True cell nuclei can be misclassified as outliers if they are located at
the embryo boundary (red arrow).

(C) Maximum intensity projections at single time points of 3D time-lapse movies over (i) a 540 min period and (ii) a 1500 min period.

(D) Performance evaluation of (i) multiple embryo extraction with outlier removal on the data set described in (C) and (ii) nuclear
segmentation over an extended period. Scale bar, 20 µm.



image; this is especially common during 3D time-lapse 
movies where cohorts of embryos are simultaneously 
imaged (Figure 4A). Second, false detections exist due to
noise and background disturbance (Figure 4B). Finally, as
the cell position within the mouse embryo determines its
developmental outcome, cells need to be classified into
either inner cells within the specimen, which constitute
ICM and contribute to the embryo proper, or outer cells,
which are allocated to the TE lineage (Figure 5).
Embryo Separation 

We followed a clustering-based approach to extract multi- 
ple embryos in the same image (Figure 4A). This problem 
becomes difficult when false detections emerge from a 
noisy background (Figure 4B). Therefore, local distance- 
based clustering techniques such as k-means or spectral 
clustering are inappropriate. Instead, we used mean-shift 
(Comaniciu and Meer, 2002), which is a mode-seeking
method that fits a "template" (a kernel) to the image. In 
our case, the kernel is a Gaussian, and the key is noticing 
that the width of the kernel naturally corresponds to the 
embryo size. 
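
A minimal sketch of mean-shift clustering with a Gaussian kernel is given below, applied to 2D nucleus centers. It only illustrates the mode-seeking idea; the function names, the rule for merging nearby modes, the coordinates, and the `bandwidth` value (standing in for the embryo size) are assumptions rather than the MINS code.

```python
import math

def mean_shift_mode(points, start, bandwidth, max_iter=100, tol=1e-6):
    """Follow the Gaussian-weighted mean from `start` until it stops moving."""
    x, y = start
    for _ in range(max_iter):
        wsum = xsum = ysum = 0.0
        for px, py in points:
            w = math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2.0 * bandwidth ** 2))
            wsum += w
            xsum += w * px
            ysum += w * py
        nx, ny = xsum / wsum, ysum / wsum
        moved = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if moved < tol:
            break
    return x, y

def separate_embryos(points, bandwidth):
    """Assign each point to the mode it converges to; nearby modes are merged."""
    modes, labels = [], []
    for p in points:
        m = mean_shift_mode(points, p, bandwidth)
        for i, q in enumerate(modes):
            if math.dist(m, q) < bandwidth / 2.0:
                labels.append(i)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return labels, modes

# Hypothetical nucleus centers from two well-separated embryos.
centers = [(0, 0), (2, 1), (-1, 2), (1, -2),
           (100, 0), (102, 1), (99, -1), (101, 2)]
labels, modes = separate_embryos(centers, bandwidth=10.0)
```

Because every point contributes to the weighted mean, a few isolated false detections shift a mode only slightly, which is the property that motivates a mode-seeking method over purely local distance-based clustering.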
Outlier Removal 

Outlier removal is performed per embryo using a robust
shape-fitting approach with an ellipse as the underlying shape
model. Our approach consists of two ingredients: random
sample consensus (referred to as RANSAC; Fischler and
Bolles, 1981) and 2D/3D ellipse fitting (see Supplemental
Information for details).
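
The RANSAC idea can be sketched with a deliberately simplified shape model. The example below fits a circle instead of an ellipse so that a model can be computed exactly from three sampled points; this substitution, the support rule (points within `tol` of the curve), the outlier rule (points farther than `tol` outside the curve), and all names and coordinates are assumptions for illustration only. The actual fitting procedure is described in the Supplemental Information.

```python
import math
import random

def circle_through(p1, p2, p3):
    """Circle (cx, cy, r) through three points, or None if they are collinear."""
    ax, ay = p1
    bx, by = p2
    cx, cy = p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-9:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)

def ransac_outliers(points, tol, iterations=200, seed=0):
    """Indices of points lying more than `tol` outside the best-supported circle."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best, best_support = None, -1
    for _ in range(iterations):
        model = circle_through(*rng.sample(points, 3))
        if model is None:
            continue
        cx, cy, r = model
        # Support: number of points lying within tol of the candidate curve.
        support = sum(1 for x, y in points
                      if abs(math.hypot(x - cx, y - cy) - r) <= tol)
        if support > best_support:
            best, best_support = model, support
    if best is None:
        return []
    cx, cy, r = best
    return [i for i, (x, y) in enumerate(points)
            if math.hypot(x - cx, y - cy) > r + tol]

# Hypothetical detections: 12 boundary nuclei, 2 interior nuclei, 1 false detection.
boundary = [(10 * math.cos(2 * math.pi * k / 12), 10 * math.sin(2 * math.pi * k / 12))
            for k in range(12)]
points = boundary + [(1.0, 1.0), (-2.0, 0.0), (25.0, 25.0)]
outliers = ransac_outliers(points, tol=0.5)
```

Interior nuclei are kept because only points outside the fitted boundary are rejected; a true nucleus sitting just beyond the fitted curve would be discarded, which mirrors the boundary misclassification noted for Figure 4B.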
ICM/TE Classification in Blastocysts 

After removal of outliers, the remaining nuclei are to be 
classified into either ICM or TE cells. Briefly, we fitted an
ellipsoidal model to the detections, and this ellipsoid essentially
describes the surface of the embryo. Considering
this ellipsoid as a function, after fitting, the surface of the
ellipsoid is of value 1 and the center of the ellipsoid
(i.e., the embryo) is of value 0. Therefore, ICM cells are
those whose center is of a value lower than 0.95, while
the rest are considered TE cells.
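
One function that is 0 at the center and 1 on the surface is the normalized quadratic form of the ellipsoid; the sketch below uses it to apply the 0.95 cutoff. The text does not state which function MINS evaluates, so the quadratic form, the axis-aligned ellipsoid, and the example center, radii, and nucleus positions are assumptions.

```python
def ellipsoid_value(point, center, radii):
    """Sum of squared normalized offsets: 0 at the center, 1 on the surface."""
    return sum(((p - c) / r) ** 2 for p, c, r in zip(point, center, radii))

def classify_nuclei(nuclei, center, radii, cutoff=0.95):
    """Label each nucleus center as 'ICM' (value below cutoff) or 'TE'."""
    return ["ICM" if ellipsoid_value(n, center, radii) < cutoff else "TE"
            for n in nuclei]

# Hypothetical fitted ellipsoid and nucleus centers (arbitrary units).
center = (0.0, 0.0, 0.0)
radii = (50.0, 40.0, 30.0)
nuclei = [(0.0, 0.0, 0.0), (25.0, 0.0, 0.0), (50.0, 0.0, 0.0), (0.0, 39.5, 0.0)]
classes = classify_nuclei(nuclei, center, radii)
```

Under this quadratic form, a cutoff of 0.95 corresponds to a thin outer shell: a nucleus must lie at roughly 97% of the radius or farther along a given direction to be called TE.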

Performance Evaluation 

Segmentation Accuracy 

To judge whether a segmented nucleus is truly meaningful, 
we applied the following criterion throughout evaluation:
a segmented nucleus is considered meaningful if the 
automated segmentation has at least 75% overlap with 
the manual segmentation. This percentage overlap pro- 
vides a rough estimate to justify whether a segmented 
nucleus is quantitatively useful, which mainly takes into 
account its size. Moreover, this threshold is expected to 
be sufficient for resolving phenotypes among various 
embryonic genotypes, even though a sufficient number 
of differently genetically manipulated embryos will, in 
practice, be necessary for a detailed analysis. We performed 
a manual evaluation by first overlaying the MINS output 
on top of the raw image and then recording all errors 
(missing, false-positive, etc.), which were used to compute 
the following basic metrics: number of segmented nuclei,
N_seg; number of true segmentations, N_TP (i.e., true-positive);
number of false segmentations, N_FP (i.e., false-positive); and
number of missing nuclei, N_FN (i.e., false-negative). Given
the true number of nuclei, N_true, we further compute precision
[= N_TP/(N_TP + N_FP)], recall (= N_TP/N_true), and F score
(which equals the harmonic mean of precision and recall).
The results on 2D and 3D data sets are shown in Table S1
and Table S2.
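
The three scores follow directly from the counts defined above, as the short sketch below shows. The function name and the example counts are illustrative assumptions and are not results from Table S1 or Table S2.

```python
def segmentation_scores(n_tp, n_fp, n_true):
    """Return (precision, recall, F score) from the basic counts."""
    precision = n_tp / (n_tp + n_fp)
    recall = n_tp / n_true
    # Harmonic mean of precision and recall.
    f_score = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Illustrative counts only: 90 true segmentations, 10 false ones, 100 real nuclei.
precision, recall, f_score = segmentation_scores(n_tp=90, n_fp=10, n_true=100)
```

Since every real nucleus is either correctly segmented or missed, N_true equals N_TP + N_FN, so recall can equivalently be computed from the missing-nuclei count.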
Multiple Embryo Extraction 

We evaluated our multiple embryo extraction approach on 
two types of data sets. The first type contains two embryos 
of large size (radius, 120 pixels) and has a clean background 
(Figure 4Ai). The second type is more challenging, with five 
embryos and a significant background structure (Figure 4Aii).
In each image, each segmented nucleus is highlighted
with its embryo ID. Although all embryos are successfully
separated, not all resulting embryos are clean because of 
false detections from the background structure (yellow 
arrows in Figure 4). We addressed this issue in the next 
outlier removal step. 
Outlier Removal 

We evaluated our outlier removal approach on 2D and 
3D data. As shown in Figure 4, outliers are marked "O" 
in yellow. Our approach can separate true cell nuclei 
from false detections. Direct comparison between Fig- 
ure 4A and Figure 4B immediately indicates the improve- 
ment. Occasionally, true cell nuclei are misclassified as 
outliers if they are located at the embryo boundary (red 
arrow in Figure 4Bii). Overall, application of embryo 
extraction and outlier removal successfully discarded false 
detections. Figure 4D quantitatively shows the step-by- 
step effect on eight images (Figure 4C) from 4D mouse 
embryo image data. 
TE versus ICM Cell Classification 

We evaluated our TE/ICM classification approach on four 
data sets with different density and embryo shape (Fig- 
ure 5D). Quantitatively, we achieved an average accuracy 
of 93.30 ± 4.64 (n = 3).

Data Interfaces and Graphical User Interface 

Data Import 

To be compatible with the common formats provided by 
major imaging vendors, we used Bio-Formats, a Java library
that reads most common file formats (Linkert et al.,
2010).

Graphical User Interface 

We designed a straightforward graphical user interface 
(GUI) that guides users through the entire processing pipe- 
line (Figures 6A-6C). After each step, users can visualize the 
result and decide whether they want to proceed to the next 
step or refine the current one. Users are also allowed to save 
or load their specific parameterization at any given stage. 
The interface only requires four key parameters from the 
users: (1) average nuclear radius, (2) noise threshold, 
(3) embryo diameter, and (4) number of embryos for 
classification. We also implemented batch processing to
automatically process large quantities of data sets using 
the same parameter set. See the detailed parameter selec- 
tion guide in the Supplemental Information. 
Export of Results 

The segmentation and classification results are exported 
into images and a table. The segmentation image and its overlay
on the raw data (TIFF format) allow for simple inspection,
validation of the results, and analysis using other
software packages. The table provides a detailed summary 
of basic characteristics of each of the segmented nuclei 
(with information on size, position, etc.), quantitation 
(sum/average of fluorescence intensity from all channels 
in the data set), and classification (the embryo it belongs 
to, outlier or not, TE or ICM cell). MINS treats interphase, 
mitotic, and apoptotic nuclei equally, but using the output 
table, one can easily discern distinct types of nuclei. High- 
fluorescence intensity is usually associated with mitosis, 
while multiple objects with a reduced size are indicative 
of apoptotic bodies (Figure S1).

We and others have used the framework on various types 
of image data sets (2D and 3D) where it has accelerated 
phenotypic analysis (Le Bin et al., 2014). MINS software 
and a detailed user guide are hosted at http://katlab-tools.org
for implementation by any users.

Comparison of MINS to Semiautomated Methods and 
Related Software Tools 

We compared MINS against several popular tools in
the community, including ilastik (http://www.ilastik.org),
FARSIGHT (http://www.farsight-toolkit.org/wiki/FARSIGHT_Toolkit),
and CellSegmentationSD
(http://www.biomedcentral.com/1471-2121/8/40/additional). We
tried multiple parameter sets for FARSIGHT and chose the 
best result. For the machine learning-based ilastik, no 
parameter tuning is required, but a training data set has 
to be created; this took 10 min. For our tool, after two 
rounds of parameter tuning, the result was satisfactory. 
For CellSegmentationSD, unfortunately, after several rounds
of trials, we could not complete the processing because of
hardware limitations.
The top series of panels from Figure 6D depict the segmen- 
tation achieved by these tools on a typical 3D mouse blasto- 
cyst-stage embryo data set. It appears that both FARSIGHT 
and ilastik have difficulties segmenting the dense nuclear 
region, which corresponds to cells of the ICM. In addition, 
with FARSIGHT, the segmented boundary appears more
rectangular, and it also erroneously splits some nuclei. On
the other hand, ilastik shows significant undersegmenta- 
tion and produces nuclei with holes inside. MINS appears 
more robust in dealing with those issues and achieves an 
accuracy of approximately 95%. By contrast, we found 
that the accuracy for FARSIGHT and ilastik was below 
85%. MINS has been designed to serve the specific needs 
of the preimplantation embryo analysis beyond the basic 
segmentation function, such as embryo separation and 
cell classification. However, the quality of MINS data analysis
depends entirely on the quality of the raw data itself.
MINS shows decreased accuracy during the analysis of
more developed or complex structures, such as embryos
with more than 200 cells (Figure 5D). Therefore, it is important
to apply MINS to various biological images or specimens
to improve the algorithms.

Application of MINS Software for Segmentation and 
Quantitative Fluorescence Measurements 

There is an imperative and growing need for quantitative 
analysis of fluorescently labeled cells within early embryos, 
as well as in stem cell populations. Our results suggest that 
MINS will accelerate this process. This is summarized in the 
following three practical applications. 

First, we applied MINS on ES cell populations that have 
been cultured under different conditions (Figure 7A). It is
well established that mouse ES cells grown in standard 
conditions (serum and leukemia inhibitory factor [LIF]) 
express the stem cell-associated transcription factor 
NANOG in a heterogeneous manner, reflecting the poten- 
tial of cells within the population to remain pluripotent or 
to differentiate (Chambers et al., 2007; Kalmar et al., 
2009). Using MINS, we measured the level of the pluripotency-associated
factor NANOG after fluorescent immunostaining.
The measurements using MINS revealed a
heterogeneous pattern of NANOG expression. When LIF 
is withdrawn from the culture media, ES cells are more 
prone to differentiate and thus express lower levels of 
NANOG. For these cells, MINS provided us with values 
indicative of their low NANOG expression status. Finally, 
it has been demonstrated that, by adding two signaling in- 
hibitors (2i), stem cells are locked into a more homoge- 
neous state of naive (or ground-state) pluripotency and 
express elevated levels of NANOG (Ying et al., 2008). 
Once again, analysis with MINS revealed that cells grown 
in these "2i" conditions expressed high levels of this transcription
factor (Figure 7A).



Figure 5. Multistep Classification: TE versus ICM Classification 

(A) Schematic of TE versus ICM lineage allocation in the preimplantation mouse embryo.

(B) Preimplantation embryos over various stages immunostained with TE (CDX2, green) and ICM (NANOG and GATA6, magenta) markers. 

(C) Schematic diagram of TE versus ICM classification procedure by MINS. 

(D) Performance evaluation of lineage classification. Scale bar, 20 μm.

Figure 6. Overview of the MINS Platform and
Its Comparison with Related Tools 

(A) The main GUI of MINS. The top boxes contain 
functions for parameter loading and saving. The
middle boxes correspond to the entire processing 
pipeline. The bottom boxes allow batch process- 
ing on a large number of data sets. 

(B) The processing pipeline and the output of 
each module.

(C) Detailed outputs ease any downstream ana- 
lyses, either manually or by integration with other 
software tools. Overlay of segmentation and raw 
data allow rapid and straightforward inspection of 
the results. A segmentation information summary 
provides easy access to quantitation results. 

(D) Top: volume rendering of a raw 3D CAG:H2B- 
GFP mouse embryo data set and the segmentation 
output generated by FARSIGHT, ilastik, and MINS.
For each segmentation, each segmented object 
is assigned a unique color descriptor. Bottom: 
visualization of a 2D section of the same data set. 
Scale bar, 20 μm.

We also evaluated the performance of MINS software
in the analysis of early mouse embryos that had been 
fixed and immunostained for transcription factors associated
with the two ICM lineages (Figure 1D), namely, the
EPI and PrE. These two cell populations originate from a
common pool of precursor cells within the ICM (Figure 1D).
In approximately 100-cell-stage embryos, these
lineages are segregated and are distinctly identified by 
the expression of NANOG in the EPI and GATA6 in the 
PrE (Chazaud et al., 2006; Plusa et al., 2008). Running 
MINS on fixed and stained 100-cell-stage embryos provided
fluorescent measurement values that were indicative
of two discrete populations (Figure 7B): one specified
for EPI (NANOG^high; GATA6^low) and another for PrE
(NANOG^low; GATA6^high). Notably, the MINS analysis also
allowed the unexpected identification of an unspecified cell that
had not committed to the EPI or PrE lineages and that
expressed both NANOG and GATA6 at comparable levels
at this relatively late stage. While developing an understanding
of this observation is outside the scope of the
present study, it demonstrates the power of such a large- 
scale (hundreds to thousands of cells analyzed from 
tens to hundreds of embryos) single-cell resolution data 
analysis in revealing detailed information that is critical 
in the formulation of a mechanistic understanding of a 
process. 
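
The two-population readout described above can be illustrated with a minimal sketch. It assumes that per-nucleus mean intensities for NANOG and GATA6 have already been exported from a segmentation. The function name `classify_icm_cell`, the `ratio_cutoff` parameter, and the example values are hypothetical; they are not part of MINS and are not taken from the data in Figure 7B.

```python
import math


def classify_icm_cell(nanog, gata6, ratio_cutoff=2.0):
    """Label one ICM nucleus from its NANOG and GATA6 intensities.

    ratio_cutoff is a hypothetical fold-difference threshold chosen for
    illustration; it is not a MINS parameter.
    """
    if nanog <= 0 or gata6 <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(nanog / gata6)
    cutoff = math.log2(ratio_cutoff)
    if log_ratio >= cutoff:
        return "EPI"          # NANOG clearly dominates
    if log_ratio <= -cutoff:
        return "PrE"          # GATA6 clearly dominates
    return "unspecified"      # comparable levels of both markers


# Made-up intensities for three nuclei (arbitrary units).
cells = [
    {"id": 1, "nanog": 820.0, "gata6": 95.0},
    {"id": 2, "nanog": 110.0, "gata6": 760.0},
    {"id": 3, "nanog": 400.0, "gata6": 430.0},
]
labels = {c["id"]: classify_icm_cell(c["nanog"], c["gata6"]) for c in cells}
```

A fixed fold-change cutoff is only one possible rule; clustering the two-channel scatterplot would be an alternative when the populations are not cleanly separated.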

Finally, we applied MINS to analyze fluorescence inten- 
sity levels of cells in the ICMs of embryos cultured ex utero 
and imaged in 3D time lapse (i.e., 4D). This type of live 
imaging allows the visualization of the dynamics of line- 
age-specific gene expression. For this analysis, we used 
embryos carrying a nuclear fluorescent reporter cassette 
(H2B-GFP) targeted in the locus encoding the platelet- 
derived growth factor receptor alpha (PDGFRa), a marker 
for the PrE lineage (Plusa et al., 2008). We previously 
described that dynamic and heterogeneous populations
with respect to Pdgfra^H2B-GFP/+ expression emerge; specifically,
GFP-positive cells were initially positioned randomly
within the ICM and were then sorted to form the epithelial
PrE layer facing the blastocyst cavity (Kang et al., 2013a;
Plusa et al., 2008). Moreover, as embryos developed and
the PrE lineage formed, GFP expression increased.
Previously, this type of analysis was performed by manual 
or semiautomated nuclear segmentation using commer- 
cially available software and subsequent fluorescent quan- 
titation measurements (Kang et al., 2013a). Consequently, 
this process was labor intensive, taking a total of between 
15 and 20 hr to complete for the movie illustrated in 
Figure 7C. By contrast, applying MINS to the same
data yielded comparable results in substantially less
time (approximately 2 hr; Figure 7C).



DISCUSSION 

Motivated by the increasing need for quantitative analysis 
of image data from preimplantation mouse embryos for 
staging and phenotyping (Miyanari and Torres-Padilla, 
2012; Plusa et al., 2008), we sought to develop a software 
tool that would allow simple rapid high-throughput semi- 
automated nuclear segmentation of image data varying 
from 2D to more complex 3D data sets. Here, we report 
the development of a tool that has been specifically trained
and tailored for use on 3D preimplantation mouse embryo
and stem cell data. It supports cell number calculation for embryo
staging, quantitative fluorescence measurement at the single-cell level,
positional classification, and nuclear size measurement. Our framework
achieves a balance between computational complexity 
and runtime. We use basic, simple operations that are suit- 
able for the detection of nuclei and segmentation. In addi- 
tion, we used parallel computing to fully exploit the poten- 
tial of the modern multicore computer systems. This 
enables a 2D image to be processed in less than 10 s and a 
typical 3D image (for example, 512 × 512 × 100 voxels) to be processed
in less than 3 min on a workstation such as an Intel
Xeon, quad core, 2.4 GHz. It is important to note that batch
processing allows the unsupervised processing of several 
data sets with the same settings. 
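
The batch mode described above, in which one set of parameters is applied unsupervised to several data sets, can be sketched as follows. This is an illustration of the dispatch pattern only, written in Python rather than in the MATLAB/C++ of MINS itself. `segment_stack`, `run_batch`, the file names, and the parameter names are all hypothetical. A thread pool is used for simplicity; CPU-bound work in pure Python would need a process pool or native code to gain real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_stack(path, settings):
    """Hypothetical stand-in for one full pipeline run on a single image stack."""
    # A real implementation would load the image at `path` and segment it;
    # here we only record which settings were applied to which file.
    return {"path": path, "settings": dict(settings)}


def run_batch(paths, settings, max_workers=4):
    """Apply the same settings to every data set; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: segment_stack(p, settings), paths))


# Hypothetical parameter names and file names, for illustration only.
shared_settings = {"sigma": 2.0, "min_volume": 50}
results = run_batch(["embryo_01.tif", "embryo_02.tif"], shared_settings)
```

Copying the settings into each result makes it easy to check afterward that every data set in the batch was processed with identical parameters.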

More important, we have attempted to make the soft- 
ware user friendly and intuitive to use. MINS does 
not require users to create a pipeline comprising basic 
modules as in CellProfiler. Similarly, there is no need for 
any extra effort on training data as is the case for ilastik. 
Furthermore, MINS software can be easily coupled with 
other tools for downstream advanced visualization or 
analysis. 

The applications of MINS software are multiple, as it 
allows single-cell measurements of confocal images. These 
include: (1) determination of cell number that, when 
applied to preimplantation mouse embryos, gives precise 
developmental staging; (2) quantitative documentation 
of nuclear size that can be used to monitor changes in 
size as development or differentiation progresses; (3) fluo- 
rescent intensity measurements that can be used as 
readouts of concentrations of specific proteins, in immuno- 
fluorescence data, or promoter activity, in fluorescent 
reporter-expressing cells or embryos; and (4) cell lineage 
allocation for mouse preimplantation embryos. It should 
be noted that, when quantitative fluorescent intensities 
are being determined, additional considerations should 
be taken into account during the acquisition of images 
and processing of the data. These include normalization, 
intensity loss compensation of, for example, confocal 
images, and background subtraction. These are nontrivial 
issues that will be addressed as the software is implemented 
and feedback is provided by the community. 
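
The three corrections named above (background subtraction, intensity loss compensation, and normalization) can be sketched for a single nucleus as follows. This is not how MINS handles them. The exponential depth model, the `decay_per_slice` value, and all function names are assumptions made for illustration, and the `reference` intensity is assumed to have been corrected in the same way before use.

```python
import math


def subtract_background(raw, background):
    """Remove a constant background estimate, clamping at zero."""
    return max(raw - background, 0.0)


def compensate_depth(signal, z_index, decay_per_slice=0.01):
    """Undo an ASSUMED exponential signal loss with imaging depth."""
    return signal * math.exp(decay_per_slice * z_index)


def normalize(signal, reference):
    """Express a channel relative to a reference channel (e.g., a nuclear label)."""
    if reference <= 0:
        raise ValueError("reference intensity must be positive")
    return signal / reference


def corrected_intensity(raw, background, z_index, reference, decay_per_slice=0.01):
    """Chain the three corrections for one nucleus in one channel."""
    signal = subtract_background(raw, background)
    signal = compensate_depth(signal, z_index, decay_per_slice)
    return normalize(signal, reference)
```

In practice the decay constant would have to be estimated from the data, for example from a uniformly expressed nuclear label measured across z-slices.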

It should be noted that we were interested in developing
a tool, which, as a first step, would segment relatively sim- 
ple stem cell or mammalian early embryo data sets. Thus 
far, we have only tested it on stem cell and preimplantation 
embryo samples. In principle, the software should be appli- 
cable to other types of data; however, we have not yet opti- 
mized the software for these data. Since the software will be 
freely available to download online, other researchers 
interested in using MINS for image analysis of their own 
biological samples will have the opportunity to test it and 
provide feedback. 

Furthermore, although the design of MINS assumes a 
certain shape of nuclei (sphere or oval), segmentation by 
MINS is not limited to nuclei with those conformations. 
MINS accurately detects nuclei with condensed chromosomes
or dividing nuclei during metaphase (Figure S1,
yellow arrowhead). However, MINS does not distinguish
these nuclei from interphase nuclei during the detection
process; instead, the increased signal intensity of these
nuclei reflects their mitotic status. In addition,
apoptotic events, as recognized by nuclear debris, can occasionally
be detected by MINS as objects with a significantly lower
volume (Figure S1, red arrowhead) compared to properly
shaped nuclei (Figure S1, white arrowhead). Had we incorporated
the identification of fragmenting nuclei, we would have
reduced the efficiency of segmentation, leading to oversegmentation
of data.
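
Because candidate debris shows up as objects of markedly lower volume, a user could flag such objects after segmentation rather than during it. The sketch below assumes a table of per-object volumes exported from a segmentation. `flag_small_objects` and the `fraction` threshold are hypothetical and are not part of MINS.

```python
from statistics import median


def flag_small_objects(volumes, fraction=0.5):
    """Return IDs of objects whose volume is below `fraction` of the median.

    `volumes` maps object ID -> volume in voxels; `fraction` is an
    illustrative threshold, not a value taken from the paper.
    """
    med = median(volumes.values())
    return sorted(obj_id for obj_id, v in volumes.items() if v < fraction * med)


# Made-up volumes: object 4 is much smaller than the rest.
example = {1: 900, 2: 1000, 3: 1100, 4: 200}
candidates = flag_small_objects(example)
```

Flagged objects would still need visual inspection, since small volume alone does not establish that an object is apoptotic debris.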

Looking to the future, we envisage two key directions for 
improving the performance of the MINS software, as well 
as its availability to end users. First, user editing should 
be integrated within the pipeline, so that specific errors 
can be corrected after each step of the pipeline. For 
example, in the current version of the software, an error 
from the detection step cannot be fixed by the segmenta- 
tion step, and user correction is needed. Second, in the 
future, the migration of the MINS software to the freely
available Python platform (http://www.python.org) will
be important. This improvement will allow independence
from the commercial MATLAB environment, making 
the software more readily accessible to a wider audience 
of users. 

EXPERIMENTAL PROCEDURES 

Stem Cells and Mouse Strains 

CAG:H2B-GFP mES cells have been described previously (Hadjan- 
tonakis and Papaioannou, 2004). XEN cells hemizygous for the 
CAG:H2B-GFP transgene were derived from CAG:H2B-GFP mouse 
embryos using standard protocols (Kunath et al., 2005; Niakan 
et al., 2013). Embryos were collected from CD-1 (Charles River)
or CAG:H2B-GFP strains of mice (Hadjantonakis and Papaioannou, 
2004). Additional details on ES and embryo culture and imaging 
are provided in the Supplemental Information. Mouse husbandry 
and all experiments were performed in accordance with Memorial 
Sloan Kettering Cancer Center Institutional Animal Care and Use 
Committee-approved protocols. 

Software Implementation Details and Availability 

MINS was implemented using a combination of MATLAB and C++.
MATLAB serves as the high-level glue language that provides the
GUI and constructs the overall pipeline. C++, on
the other hand, was used to implement the underlying algorithms
for better computational efficiency. All core algorithmic components
are implemented in C++ and invoked in MATLAB as functions.
Furthermore, some algorithms are parallelized, including the
PSGIS algorithm. The implementation has GUI support and is
available to interested users. Additional technical details on the
algorithms supporting MINS are provided in the Supplemental
Information. MINS software and a detailed user guide are hosted
at http://katlab-tools.org.

Currently, MINS runs on a PC with 64-bit Windows OS.
Necessary supporting software includes MATLAB with the
Image Processing and Statistics Toolboxes. Java Runtime Environment
is also required. For segmenting large 3D data, we used an
Intel Xeon Processor E5530 Quad Core 2.40 GHz with 24 GB
memory.



Figure 7. Application of MINS for Quantitative Fluorescent Measurements 

(A) Nuclear segmentation and quantitative fluorescent measurements of mouse ES cells grown under different conditions that have been 
stained for the pluripotency-associated factor NANOG. Stem cells display a heterogeneous pattern of NANOG expression when grown under 
standard serum + LIF conditions (left column) but either downregulate its expression when LIF is absent (middle column) or markedly
increase its expression in the presence of the 2i inhibitors (right column). Quantitative fluorescent measurements via MINS are indicative
of the culture conditions used (scatterplot at right). 

(B) Nuclear segmentation and quantitative immunofluorescence of a 100-cell-stage embryo stained for the EPI-specific factor NANOG and
the PrE-specific factor GATA6. There are two distinct populations of cells within the ICM, either expressing high levels of GATA6 (blue
arrowheads) or high levels of NANOG (red arrowheads). A single cell expresses similar levels of NANOG and GATA6 and could thus be 
categorized as unspecified for either EPI or PrE (green arrowhead). Quantitative fluorescent measurements via MINS are indicative of the 
two distinct cell populations within the ICM (scatterplot at the right). 

(C) Comparison of quantitative analysis of 3D time-lapse imaging data performed either manually or with MINS software on embryos
carrying the Pdgfra^H2B-GFP/+ reporter cultured ex utero. Select single time points from a representative 3D time-lapse movie are shown in the
top row. GFP intensities of individual cells identified in each embryo at selected time points are shown in the scatterplot at the bottom.
Scale bar, 20 μm.

Supplemental Information for this article includes one figure, two
tables, seven movies, and Supplemental Experimental Procedures 
and can be found with this article online at http://dx.doi.org/10.